ABSTRACT

The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).



COMMENTARY 

This 19th annual Database Issue of Nucleic Acids Research (NAR) features descriptions of 92 new online databases covering a variety of molecular biology data, 77 update papers on databases that have been previously described in the NAR Database Issue and 23 papers with updates on database resources whose descriptions have previously been published in other journals (Table 1). The accompanying NAR online Molecular Biology Database Collection (http://www.oxfordjournals.org/nar/database/a/) has been revised: the URLs of more than 30 databases were updated and more than 20 obsolete web sites were removed. The list now includes 1380 databases sorted into 14 categories and 41 subcategories.

NEW AND UPDATED DATABASES 

This issue contains an unusually high number of papers from the authors' respective host institutions, NCBI and EMBL-EBI. In addition to the annual papers from the International Nucleotide Sequence Database Collaboration [INSDC (1), which includes the DNA Data Bank of Japan, GenBank and the European Nucleotide Archive (2-4)], Ensembl (5), UniProtKB (6) and the Protein Data Bank in Europe (7), these include two papers that describe the BioSample database project, recently launched at both institutions. The BioSample databases [http://www.ncbi.nlm.nih.gov/biosample and http://www.ebi.ac.uk/biosamples/, (8) and (9), respectively] aim to capture essential information about each biological sample used to obtain sequence, gene expression or protein expression data, as well as the relationships between different samples and their sources. The sample
Table 1. New databases featured in the 2012 NAR Database Issue

Database name | URL | Brief description
ApoHoloDB | http://ahdb.ee.ncku.edu.tw/ | Apo- and Holo- structure pairs of proteins
AutismKB | http://autism.cbi.pku.edu.cn | Autism genetics knowledgebase
BGMUT | http://www.ncbi.nlm.nih.gov/projects/gv/mhc/xslcgi.cgi?cmd=bgmut | Blood Group antigen gene Mutation database
BitterDB | http://bitterdb.agri.huji.ac.il/bitterdb/dbbitter.php | Bitter taste: molecules and receptors
canSAR | http://cansar.icr.ac.uk | Integrated cancer research and drug discovery resource
CAPS-DB | http://www.bioinsilico.org/CAPSDB | Classification of helix cappings in protein structures
ccPDB | http://crdd.osdd.net/raghava/ccpdb/ | Compilation and creation of datasets from Protein Data Bank
CharProtDB | http://www.jcvi.org/charprotdb/ | Experimentally Characterized Protein annotations
COLT-Cancer | http://colt.ccbr.utoronto.ca/cancer | Essential gene profiles in human cancer cell lines
Crystallography Open Database | http://www.crystallography.net/ | Crystal structures of small molecules
Cube-DB | http://epsf.bmad.bii.a-star.edu.sg/cube/db/html/home.html | Functional divergence in human protein families
DARC | http://darcsite.genzentrum.lmu.de/darc/ | Database for Aligned Ribosomal Complexes
DBETH | http://www.hpppi.iicb.res.in/btox | Database for Bacterial ExoToxins for Humans
Death Domain database | http://www.deathdomain.org | Protein interaction data for Death Domain superfamily
DIGIT | http://www.biocomputing.it/digit4/ | Database of ImmunoGlobulin sequences and Integrated Tools
Disease Ontology | http://diseaseontology.sf.net/ | Ontology for a variety of human diseases
DiseaseMeth | http://202.97.205.78/diseasemeth | Human disease methylation database
DistiLD | http://distild.jensenlab.org/ | Diseases and Traits In Linkage Disequilibrium blocks
DNAtraffic | http://dnatraffic.ibb.waw.pl/ | DNA dynamics during the cell cycle
DOMMINO | http://dommino.org | Database of MacroMolecular INteractions
doRiNA | http://dorina.mdc-berlin.de | Database of RNA interactions in post-transcriptional regulation
DR.VIS | http://www.scbit.org/dbmi/drvis | Human Disease-Related Viral Integration Sites
EBI BioSample Database | http://www.ebi.ac.uk/biosamples/ | Biological samples used as sources of sequence, structure or expression data
EcoliWiki | http://ecoliwiki.net | Community-based pages about non-pathogenic E. coli
eQuilibrator | http://equilibrator.weizmann.ac.il | Thermodynamics calculator for biochemical reactions
FungiDB | http://fungidb.org | Functional genomics of fungi
FunTree | http://www.ebi.ac.uk/thornton-srv/databases/FunTree/ | Evolution of novel enzyme functions in enzyme superfamilies
GeneWeaver | http://www.GeneWeaver.org | Functional genomics analysis system
GONUTS | http://gowiki.tamu.edu | Gene Ontology Normal Usage Tracking System
GWASdb | http://jjwanglab.org/gwasdb | Human genetic variants identified by genome-wide association studies
HaploReg | http://compbio.mit.edu/HaploReg | SNP-centric access to chromatin state information
HFV database | http://hfv.lanl.gov/ | Hemorrhagic fever virus sequence database
hiPathDB | http://hipathdb.kobic.re.kr/ | Human Integrated Pathway Database
Histome | http://www.histome.net/ | Human histone database
HotRegion | http://prism.ccbb.ku.edu.tr/hotregion | Database of interaction Hotspots
Human OligoGenome Resource | http://oligogenome.stanford.edu/ | Oligonucleotides for targeted resequencing of the human genome
ICEberg | http://db-mml.sjtu.edu.cn/ICEberg/ | Integrative and Conjugative Elements in Bacteria
IDEAL | http://www.ideal.force.cs.is.nagoya-u.ac.jp/IDEAL/ | Intrinsically Disordered proteins with Extensive Annotations and Literature
IGDB.NSCLC | http://igdb.nsclc.ibms.sinica.edu.tw | Integrated Genomic Database of Non-Small Cell Lung Cancer
IndelFR | http://indel.bioinfo.sdu.edu.cn | Indel Flanking Region database
InterEvol | http://biodev.cea.fr/interevol | Evolution of protein-protein Interfaces
LegumeIP | http://plantgrn.noble.org/LegumeIP/ | Model Legumes Integrative database Platform
MetaBase | http://metadatabase.org | Wiki database of biological databases
MethylomeDB | http://epigenomics.columbia.edu/methylomedb/ | DNA methylation profiles in human and mouse brain
MINAS | http://www.minas.uzh.ch | Metal Ions in Nucleic AcidS
MIPModDB | http://bioinfo.iitk.ac.in/MIPModDB | Major Intrinsic Protein superfamily Models
miREX | http://bioinfo.amu.edu.pl/mirex | Plant microRNA Expression data
miRNEST | http://mirnest.amu.edu.pl | microRNAs in animal and plant EST sequences
MMMDB | http://mmdb.iab.keio.ac.jp/ | Mouse Multiple Tissue Metabolomics Database
modMine | http://intermine.modencode.org | Mining of modENCODE data
MOPED | http://moped.proteinspire.org | Model Organism Protein Expression Database
NCBI BioSample | http://www.ncbi.nlm.nih.gov/biosample | Biological samples used as sources of sequence, structure or expression data
NCBI BioProject | http://www.ncbi.nlm.nih.gov/bioproject | Linked data related to a single research project
Nematodes.org | http://www.nematodes.org/nematodegenomes/ | Wiki for coordinating nematode sequencing projects
Newt-omics | http://newt-omics.mpi-bn.mpg.de | Data on red spotted newt Notophthalmus viridescens
neXtProt | http://www.nextprot.org/ | A knowledgebase for human proteins
NRG-CING | http://nmr.cmbi.ru.nl/NRG-CING | Validated NMR structures of proteins and nucleic acids
OGEE | http://ogeedb.embl.de | Online GEne Essentiality database
PDBj | http://pdbj.org/ | Protein Data Bank Japan
PhenoM | http://phenom.ccbr.utoronto.ca | Morphological database of essential yeast genes
Phytozome | http://www.phytozome.net/ | JGI's platform for green plant genomics
PlantNATsDB | http://bis.zju.edu.cn/pnatdb/ | Plant natural antisense transcripts
Polbase | http://polbase.neb.com | Biochemical, genetic, and structural information about DNA polymerases
PomBase | http://www.pombase.org/ | Genome database on S. pombe
PoSSuM | http://possum.cbrc.jp/PoSSuM/ | Ligand-binding POcket Similarity Search Using Multiple-Sketches
Predictive Networks | http://predictivenetworks.org | Integration, navigation, visualization, and analysis of gene interaction networks
ProGlycProt | http://www.proglycprot.org | Experimentally characterized Prokaryotic Glycoproteins
ProOpDB | http://operons.ibt.unam.mx/OperonPredictor/ | Prokaryotic Operon DataBase
ProPortal | http://proportal.mit.edu/ | Prochlorococcus marinus and its phages
ProRepeat | http://prorepeat.bioinformatics.nl/ | Amino acid tandem Repeats in Proteins
ProtChemSI | http://pcidb.russelllab.org/ | Protein-Chemical Structural Interactions
PSCDB | http://idp1.force.cs.is.nagoya-u.ac.jp/pscdb/ | Protein Structural Change upon ligand binding
RecountDB | http://recountdb.cbrc.jp | Recalculated transcript amounts database
Rhea | http://www.ebi.ac.uk/rhea/ | EBI's biochemical reaction database
RNA CoSSMos | http://cossmos.slu.edu | RNA Characterization of Secondary Structure Motifs
ScerTF | http://ural.wustl.edu/TFDB/ | Binding sites for Saccharomyces cerevisiae Transcription Factors
SCRIPDB | http://dcv.uhnres.utoronto.ca/SCRIPDB/search | Search for Chemicals and Reactions In Patents
SEQanswers | http://seqanswers.com/wiki/SEQanswers | Wiki on all aspects of next-generation genomics
SitEx | http://www-bionet.sscc.ru/sitex/ | Projections of protein functional Sites on Exons
SNPedia | http://www.SNPedia.com | Wiki on SNPs and genome annotation
SpliceDisease | http://cmbi.bjmu.edu.cn/Sdisease | Links between RNA splicing and disease
STAP refinement of NMRdb | http://psb.kobic.re.kr/STAP/refinement | Refined solution NMR structures
Stem Cell Discovery Engine | http://discovery.hsci.harvard.edu/ | Comparison system for cancer stem cell analysis
TopFIND | http://clipserve.clip.ubc.ca/topfind | Protein N- and C-termini and protease processing
UMD-BRCA1/BRCA2 databases | http://www.umd.be/BRCA1/ | BRCA1 and BRCA2 mutations detected in France
UniPathway | http://www.grenoble.prabi.fr/obiwarehouse/unipathway | Metabolic pathway information in UniProt knowledgebase
VIRsiRNAdb | http://crdd.osdd.net/servers/virsirnadb | Experimentally validated Viral siRNA/shRNA
YeTFaSCo | http://yetfasco.ccbr.utoronto.ca/ | Yeast Transcription Factor binding Site sequence Collection
YMDB | http://www.ymdb.ca | Yeast Metabolome Database
zfishbook | http://zfishbook.org/ | Transposon-labeled mutants in zebrafish



information includes the name of the source organism (or an environmental isolate) and the source material within that species, such as the organ, tissue and cell type. It will also contain information about the isolation source of the sample, such as (some or all of) the locality, host and collection date. For human sources, BioSample information will include any available (and ethically appropriate) additional data, such as the disease state and clinical information [clinical samples that may raise privacy concerns will continue to be kept in NCBI's dbGaP database (10) and the EBI's European Genome-phenome Archive (http://www.ebi.ac.uk/ega/), with sanitized versions available in the BioSample databases]. While providing sample information will place an additional burden on submitters, the availability of BioSample data should dramatically improve the experience of a typical user. By consistently recording sample information for various kinds of data stored in the NCBI and EBI databases, the BioSample databases will allow smooth cross-database searching of all available information pertaining to a particular sample source, such as a cell type, a disease or a tissue biopsy. Furthermore, since NCBI and EBI agreed to assign shared sample accession numbers, these numbers can now be used to query the web sites of both institutions (8,9).
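To make the idea of shared sample accessions and cross-database searching more concrete, the following minimal Python sketch models it with toy in-memory data. It is not the BioSample schema or API: the `SampleRecord` fields, the `search_by_attribute` helper, and all accessions and record IDs are hypothetical placeholders chosen for illustration. What it shows is that once every data record points to a sample accession, a single query on a sample attribute (here, cell type) retrieves related records from several databases at once.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical sample descriptor; field names are illustrative only."""
    accession: str                        # placeholder for a shared sample accession
    organism: str                         # source organism or environmental isolate
    tissue: Optional[str] = None
    cell_type: Optional[str] = None
    isolation_source: Optional[str] = None
    host: Optional[str] = None
    collection_date: Optional[str] = None
    disease_state: Optional[str] = None   # human samples: sanitized data only


def search_by_attribute(
    samples: Dict[str, SampleRecord],
    databases: Dict[str, Dict[str, str]],
    attr: str,
    value: str,
) -> Dict[str, List[str]]:
    """Find samples whose `attr` equals `value`, then list, per database,
    the records that point to one of those sample accessions."""
    hits = {acc for acc, s in samples.items() if getattr(s, attr) == value}
    return {
        name: sorted(rec_id for rec_id, acc in db.items() if acc in hits)
        for name, db in databases.items()
    }


# Toy data: accessions and record IDs are invented placeholders.
samples = {
    "SAMPLE-1": SampleRecord("SAMPLE-1", "Homo sapiens", tissue="liver", cell_type="hepatocyte"),
    "SAMPLE-2": SampleRecord("SAMPLE-2", "Homo sapiens", tissue="lung", disease_state="NSCLC"),
    "SAMPLE-3": SampleRecord("SAMPLE-3", "Mus musculus", tissue="liver", cell_type="hepatocyte"),
}
# Each toy database maps its own record IDs to the sample accession they derive from.
databases = {
    "sequence": {"SEQ-1": "SAMPLE-1", "SEQ-2": "SAMPLE-2"},
    "expression": {"EXP-1": "SAMPLE-1", "EXP-2": "SAMPLE-3", "EXP-3": "SAMPLE-2"},
}

print(search_by_attribute(samples, databases, "cell_type", "hepatocyte"))
# → {'sequence': ['SEQ-1'], 'expression': ['EXP-1', 'EXP-2']}
```

Because the lookup key is the sample accession rather than any database-specific identifier, adding another database to `databases` extends the search without changing the query.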

The NCBI paper (8) also presents the BioProject database (http://www.ncbi.nlm.nih.gov/bioproject), another INSDC initiative, which aims to provide a higher-order organization of large-scale data submitted by a single organization or a consortium, funded from a single source, or relating to the same whole-genome assembly. Again, the availability of such metadata should simplify the task of retrieving related data sets from different kinds of databases held at NCBI, EBI and DDBJ.

Five papers in this issue describe database resources of the US Department of Energy's Joint Genome Institute (JGI, http://www.jgi.doe.gov). These include a description of the JGI Genome Portal (11) with its fungal (MycoCosm), plant (Phytozome), prokaryotic (IMG)
Table 2. Database updates new for the NAR Database Issue

Database name | URL | Brief description
BYKdb | http://bykdb.ibcp.fr/ | Bacterial protein tYrosine Kinase database
BuG@Sbase | http://bugs.sgul.ac.uk/E-BUGS-PUB | Microarray datasets for microbial gene expression
ChEMBL | https://www.ebi.ac.uk/chembldb | EMBL's database of bioactive drug-like small molecules
ConoServer | http://www.conoserver.org/ | Sequence and structures of peptides expressed by marine cone snails
CoryneRegNet | http://coryneregnet.cebitec.uni-bielefeld.de/ | Corynebacterial Regulatory Network
ExoCarta | http://exocarta.ludwig.edu.au | Database on exosomes, membrane vesicles of endocytic origin released by diverse cell types
FunCoup | http://funcoup.sbc.su.se/ | Networks of Functional Coupling of proteins
HmtDB | http://www.hmtdb.uniba.it/ | Human mitochondrial genome variability
MimoDB | http://immunet.cn/mimodb | Mimotope database, active site-mimicking peptides from phage-display libraries
MIRIAM Registry | http://www.ebi.ac.uk/miriam/ | Minimal Information Required In the Annotation of Models
MitoMiner | http://mitominer.mrc-mbu.cam.ac.uk/ | Mitochondrial proteomics data
MitoZoa | http://www.caspur.it/mitozoa | Mitochondrial genomes in Metazoa
NAPP | http://rna.igmors.u-psud.fr/NAPP | Nucleic Acid Phylogenetic Profile database
OPMdb | http://opm.phar.umich.edu | Orientations of Proteins in Membranes database
PhosphoSitePlus | http://www.phosphosite.org/ | Protein phosphorylation sites and other post-translational modifications
PINA | http://cbg.garvan.unsw.edu.au/pina/ | Protein Interaction Network Analysis
Plant Metabolomics | http://plantmetabolomics.vrac.iastate.edu/ | Arabidopsis metabolomics database
PLEXdb | http://www.plexdb.org | Gene Expression Resources for Plants and Plant Pathogens
Pocketome | http://www.pocketome.org | Small-molecule binding pockets in the structural proteome
SABIO-RK | http://sabiork.h-its.org/ | System for the Analysis of Biochemical Pathways Reaction Kinetics
SubtiWiki | http://subtiwiki.uni-goettingen.de/ | Collaborative resource for the Bacillus community
TDR Targets | http://tdrtargets.org/ | Targets against neglected tropical diseases
WikiPathways | http://www.wikipathways.org | Community curation of biological pathways


and metagenomic (IMG/M) resources, and the Genomes Online Database (GOLD, http://www.genomesonline.org), which lists the ongoing genomic and metagenomic projects (12).

One of the major highlights of this issue is the first description of neXtProt, a knowledgebase on human proteins that has been created at the Swiss Institute of Bioinformatics (SIB) on the basis of the human protein set in UniProtKB/Swiss-Prot and then expanded by including quality-assessed protein expression, localization, variation and proteomics data (13). Other highlights include CharProtDB, a database of experimentally characterized proteins that is used for genome annotation at the J. Craig Venter Institute (14); a detailed explanation of the basic principles behind the NCBI Taxonomy Database and the ways it ties together various DNA and protein sequence and gene expression data for all organisms and taxonomic groups represented in GenBank (15); descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects (16,17); and updates on the model organism databases SGD, MGD, FlyBase and WormBase (18-21) and on the Pfam, SMART and InterPro domain databases (22-24).

Amid the diversity of the databases featured in this issue, the major trend appears to be an increased focus on small molecules (ChEMBL, PubChem, BitterDB, SCRIPDB, Crystallography Open Database) and related topics, such as properties of enzyme-catalyzed reactions (Rhea, MACiE, eQuilibrator, SABIO-RK), protein-ligand binding (Pocketome, PoSSuM, ProtChemSI, STITCH), and the analysis of potential drugs and drug targets for human disease (canSAR, DAMPD, DBETH, SuperTarget, TDR Targets, Therapeutic Target Database). As in previous years, there is a strong representation of structure databases, including descriptions of the European and Japanese Protein Data Banks (PDBe, PDBj), two databases of refined NMR structures (NRG-CING and STAP Refinement of NMR database) and several other databases on protein structure and protein-protein interactions.

An unusually high number of databases, including ChEMBL, FunCoup, MitoMiner, PhosphoSitePlus, Pocketome, SABIO-RK and TDR Targets, are featured in this NAR Database Issue for the first time after having their descriptions published elsewhere (Table 2). All these databases have been available online for several years and have been accepted and valued by the community. Accordingly, they presented few, if any, problems with database design, although some appeared somewhat less user-friendly than is required for the NAR Database Issue. We consider the publication of these papers in the NAR Database Issue a continuation of our efforts to bring readers the best publicly available molecular biology databases, as well as a reflection of the unique status of this publication, which introduces databases to a very wide audience.

In response to the growing popularity of Wikipedia (http://www.wikipedia.org) and of wiki-based approaches to constructing and curating biological databases, this issue includes a special section with 10 papers describing various wiki-based databases. These papers are introduced in an accompanying editorial by Rob Finn, Paul Gardner and Alex Bateman (25), whose very popular Pfam (22) and Rfam (26) databases successfully incorporate wiki elements. It could be argued that the Pfam update paper (22) should have been placed in that section as well.
SUSTAINABILITY OF BIOINFORMATICS DATABASES

A joint paper in this issue from the three INSDC members (27) discusses the progress of the Sequence Read Archive (SRA, previously known as the Short Read Archive) but does not mention the controversy that surrounded the SRA in the past year. Established in 2007 as a public repository of raw sequence data from next-generation sequencing platforms, the SRA stores sequence data generated for RNA-Seq, ChIP-Seq and genotyping studies, as well as from several large-scale projects, such as the Human Microbiome Project (https://commonfund.nih.gov/hmp) and the 1000 Genomes Project (http://www.1000genomes.org) (27). In June 2011, its volume surpassed 100 terabases (10^14 bases) of DNA. In February 2011, NCBI announced that, due to budget constraints, it would discontinue the SRA within the next 12 months (http://www.ncbi.nlm.nih.gov/About/news/16feb2011). This announcement caused a widespread response (28). One news source even claimed that NCBI 'announced that it would slowly phase out its DNA archive due to federal budget cuts'. There was also an extensive online discussion on the http://seqanswers.com wiki web site (which is described in a separate paper in this issue). However, the news of the SRA's demise proved largely premature. Within days, EBI and DDBJ announced that they would continue supporting the SRA (http://www.ebi.ac.uk/ena/SRA_announcement_Feb_2011.pdf, http://www.ddbj.nig.ac.jp/whatsnew/2011/DRA20110222.html), and the NIH provided support to enable the continuation of the SRA (http://www.ncbi.nlm.nih.gov/About/news/13Oct2011.html). Still, given that the SRA keeps growing at a rapid pace and handling the data becomes increasingly complicated, the INSDC paper carefully states that 'SRA partners actively discuss and pursue approaches together with user communities to maximize the benefit gained from archiving next-generation sequencing data while minimizing the infrastructure costs' (27).

Despite its successful resolution, the SRA story highlights an important question: should public database providers try to keep all sequence-related data, or should they make choices about the kinds of resources they maintain? The same news release in February 2011 announced the closure of Peptidome, the NCBI resource for tandem mass spectrometry peptide and protein identification data (29). The closure of Peptidome attracted far less attention than that of the SRA, probably because of the continued operation of EBI's PRIDE (30), the Seattle Proteome Center's PeptideAtlas (31), the recently created MOPED (32) and other proteomics resources. Still, it is definitely a sign of things to come, as is the recently announced closure of the International Protein Index, which is to be replaced by the complete proteome sets in UniProtKB (33).

Most importantly, the worldwide attention to the SRA story illuminates the deep concern that exists in the community with regard to the stability (viability) of the online databases that have become key resources enabling all kinds of biomedical research. Previously, we have seen a natural selection of databases that led to a relatively orderly succession: as some databases grew obsolete, they were replaced by similar but more robust databases maintained elsewhere. For example, after the termination of IRESdb, a database of internal ribosome entry sites (34), the same data remained available through the IRESite database (35). Among the databases featured in this issue, MitoZoa provides the same coverage of metazoan mitochondrial genomes as the now-defunct AMmtDB, Gene3D fully replaces the no-longer-maintained 3D-Genomics, and Ensembl (5) provides the alternative splicing data that were previously available through ASHESdb, EBI's ASD/ATD/ATSD and several other recently discontinued databases.

Unfortunately, owing to difficult economic times, budget constraints are now leading to the termination (or commercialization) of truly unique resources, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg) and The Arabidopsis Information Resource (TAIR, http://arabidopsis.org), both featured in this issue (36,37). The KEGG database, maintained by Minoru Kanehisa and his colleagues at the Bioinformatics Center of the Kyoto University Institute for Chemical Research, has been a permanent feature of the NAR Database Issue since 1997 and is now in its 60th release (36; see http://www.genome.jp/en/release.html). However, now that Kanehisa, who was one of the founders of GenBank and has been at the forefront of bioinformatics research ever since, has reached the mandatory retirement age, the future of KEGG has suddenly become uncertain (see http://www.genome.jp/kegg/docs/plea.html). For now, KEGG continues to be publicly available, but its funding mechanisms support a narrow focus on translational research (36), which is certainly important but is only a minor part of the enormous contribution of this database to the progress of genomics and bioinformatics around the world.

The case of TAIR is even more troubling. Over the past 12 years, TAIR enjoyed generous support from the US National Science Foundation (NSF, http://www.nsf.gov) that helped it grow into a recognized source of sequence data and curated annotation of the model plant Arabidopsis thaliana. Three previous publications on TAIR in the NAR Database Issue, in 2001, 2003 and 2008, were all extremely well cited, confirming the widespread use of this resource. With the completion of the Arabidopsis sequencing project, the focus of TAIR shifted from providing new annotation to improving the existing genome annotation, making it the ultimate source of gene annotation and expression data for A. thaliana. Unfortunately, this new focus failed to win NSF support, and the funding for a project that until recently had been heralded as one of the NSF's best success stories will end in August 2013. This will likely mean the termination of TAIR as we know it; the existing plans for corporate sponsorship of TAIR and/or for its shift to an International Arabidopsis Informatics Consortium (see http://www.arabidopsis.org/doc/about/tair_funding/410) are not going to prevent the demise of this useful genomic resource.
These recent developments show that the importance of public database resources, which is obvious to any biologist, needs to be constantly highlighted to national and international funding bodies. We all remember the financial difficulties encountered in the 1990s by the Swiss-Prot database after it failed to secure sufficient support from the European Union (http://web.expasy.org/docs/crisis96/help-sprot.html) (38). Fortunately, in the end, the Swiss government recognized the value of that unique resource and provided funding to support Swiss-Prot (39). The Swiss government now supports the UniProtKB/Swiss-Prot activities at the SIB, whereas funding for the UniProtKB activities at the EBI and PIR is provided by the NIH, NSF and the European Commission (6).

The stories of Swiss-Prot, KEGG and TAIR also illustrate the need [clearly articulated in a recent paper by Julian Parkhill, Ewan Birney and Paul Kersey (40)] for a comprehensive infrastructure that would (i) support the key bioinformatics resources, (ii) extend to the model organism databases and (iii) bring genomic information into every biological lab. In the USA, such infrastructure includes the NCBI, the JGI and associated DOE labs, the NIH-funded Bioinformatics Resource Centers (this issue includes papers on VectorBase and ViPR, as well as on EuPathDB-associated databases, such as GeneDB, FungiDB and TDR Targets) and comprehensive resources on model organisms, such as FlyBase, WormBase, SGD and MGD (18-21). In Europe, coordination of the bioinformatics infrastructure is planned through the EU-sponsored ELIXIR (European Life Sciences Infrastructure for Biological Information, http://www.elixir-europe.org) project, which aims to guarantee seamless access to biological information by integrating data generators and data centers throughout Europe.



AN ECOSYSTEM OF DATABASES 

Although this issue looks like a simple catalog, it is important to note that we are not dealing with isolated resources: many listed databases interact in a variety of ways, forming a network of interconnected (or at least hyperlinked) data resources. Most obviously, UniProtKB provides a plethora of links to all kinds of databases, including ENA, GenBank, DDBJ, RefSeq, PDBe, PDBj, IntAct, MINT, Ensembl, KEGG, UCSC Genome Browser, neXtProt, SGD, FlyBase, WormBase, MGD, TAIR, eggNOG, MetaCyc, InterPro, Gene3D, Pfam, SMART and ProtoNet, all of which are featured in this issue. However, many database interactions are more subtle: for example, BioMart has recently been used to link protein annotation data from the Reactome database of metabolic networks (41) to phosphoproteomics data in PRIDE (30) and somatic mutations in COSMIC (42), which made it possible to put cancer-related mutation data into a functional context (43).
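The network view can be illustrated with a small directed graph. The sketch below is a toy model, not a map of actual cross-references: it contains only the links named in the paragraph above (UniProtKB's outgoing links and the BioMart-mediated connection of Reactome to PRIDE and COSMIC, simplified here to direct edges), and the `links` table and `reachable` helper are hypothetical names. A breadth-first search then lists every resource a user could reach by following links from a starting database.

```python
from collections import deque
from typing import Dict, List, Set

# Toy link graph: only the links named in this section; not a complete map.
links: Dict[str, List[str]] = {
    "UniProtKB": [
        "ENA", "GenBank", "DDBJ", "RefSeq", "PDBe", "PDBj", "IntAct", "MINT",
        "Ensembl", "KEGG", "UCSC Genome Browser", "neXtProt", "SGD", "FlyBase",
        "WormBase", "MGD", "TAIR", "eggNOG", "MetaCyc", "InterPro", "Gene3D",
        "Pfam", "SMART", "ProtoNet",
    ],
    # The BioMart-based integration, simplified to direct edges from Reactome.
    "Reactome": ["PRIDE", "COSMIC"],
}


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every resource reachable from `start` by following links."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(len(reachable(links, "UniProtKB")))   # → 24
print(sorted(reachable(links, "Reactome")))  # → ['COSMIC', 'PRIDE']
```

In this toy graph no target has outgoing links of its own, so every search stops after one hop; each additional outward link recorded for a database would extend what users can reach from it.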

We believe that establishing connections between databases is an important way of improving the databases themselves, providing the user with additional search tools and, more generally, creating a live ecosystem that stores and expands knowledge. Accordingly, we consider it essential that the databases featured in the NAR Database Issue do their best to create links to outside resources and to provide an easy and straightforward way for the authors of other databases to link to their content.

Last year, we published a paper by the BioDBcore Working Group that proposed creating a resource of 'minimal information about a biological database', a community-defined, uniform, generic description of the core attributes of biological databases (44). Accordingly, submitters to this year's NAR Database Issue were asked to fill out the checklist of core attributes (available at http://www.biodbcore.org) for their databases and provide it as supplementary material to their manuscripts. Most of the authors complied with this request, which resulted in a stand-alone resource that contains machine-readable descriptions of the databases featured in this issue and is available from the BioSharing web site (http://biosharing.org/biodbcore). We hope that this effort will illuminate the scope and general features of every listed database resource, including the community standards that these systems support, forge better contacts between their authors, simplify the linking of various data sets and, eventually, bring greater clarity and integration to the whole field of molecular biology databases.
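As a rough illustration of what a machine-readable database description might look like, the Python sketch below stores a few core attributes for one database from Table 1 and serializes them to JSON. The field names, the choice of required attributes and the helper functions are assumptions made for this example; they are a small hypothetical subset and do not reproduce the official BioDBcore checklist or its exchange format. The attribute values for neXtProt are taken from this editorial.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CoreDescription:
    """Hypothetical subset of core database attributes (not the official checklist)."""
    name: str
    url: str
    description: str
    taxonomic_coverage: List[str] = field(default_factory=list)
    data_types: List[str] = field(default_factory=list)
    standards: List[str] = field(default_factory=list)


# Assumed minimum for a usable record; the real checklist defines its own requirements.
REQUIRED = ("name", "url", "description")


def missing_required(record: CoreDescription) -> List[str]:
    """Return the required attributes that are empty in `record`."""
    values = asdict(record)
    return [key for key in REQUIRED if not values[key]]


def to_json(record: CoreDescription) -> str:
    """Serialize a record to a stable, machine-readable JSON string."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)


nextprot = CoreDescription(
    name="neXtProt",
    url="http://www.nextprot.org/",
    description="A knowledgebase for human proteins",
    taxonomic_coverage=["Homo sapiens"],
    data_types=["protein expression", "localization", "variation", "proteomics"],
)

print(missing_required(nextprot))  # → []
print(to_json(nextprot))
```

Because each record is plain structured data, descriptions of different databases can be checked for completeness and compared programmatically, for instance by intersecting their `standards` lists.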